
Move Arangodb to new provider structure #46124

Closed

Conversation

@Prab-27 (Contributor) commented Jan 27, 2025

Related: #46045


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk potiuk force-pushed the move-arangodb-to-new-provider-strcucture branch from f78dd50 to baf13ab Compare January 27, 2025 19:02
@potiuk (Member) commented Jan 27, 2025

Rebased it after fixing the http provider issue

@potiuk potiuk force-pushed the move-arangodb-to-new-provider-strcucture branch from baf13ab to 6115777 Compare January 30, 2025 08:39
@potiuk (Member) commented Jan 30, 2025

One link to a moved example needs to be fixed @Prab-27

@potiuk (Member) commented Jan 30, 2025

And conflicts to resolve now :(

@Prab-27 (Contributor, author) commented Jan 30, 2025

> And conflicts to resolve now :(

Sorry, I used the GitHub workflow to resolve the conflict. I should have done a manual rebase.
Should I revert these changes?

``python-arango`` ``>=7.3.2``
================== ==================

The changelog for the provider package can be found in the

@potiuk (Member) commented Jan 30, 2025

No - you need to locally rebase and resolve conflicts (and run pre-commit etc.). But it might be easier to do it again.

o-nikolas and others added 12 commits February 1, 2025 00:18
* Enable scheduled_dttm field in task_instance

* Fixed unit tests

* Fixed Unit test

* Removed comment

* Fixed unit tests

---------

Co-authored-by: Marco Küttelwesch <[email protected]>
* Add shortcut key support for search dags.

* Display meta key for respective OS.
* Migrate Apache Drill Provider Package to new Structure

* Fix Typo

Co-authored-by: Jens Scheffler <[email protected]>

---------

Co-authored-by: Jens Scheffler <[email protected]>
Where possible in dag processor, replace references to fileloc with bundle name + relative fileloc. This is a stepping stone on the way to removing the fileloc attr (or perhaps we change it to mean relative and remove relative as a distinct field). One place could not be updated in this PR because it would require updating ParseImportError, and I did not want to overcomplicate this PR.
In a few places I also try to make the code a little more intelligible without changing its behavior.
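
For illustration, a minimal sketch of the bundle-relative idea described above; the function and argument names here are assumptions for the example, not the dag processor's actual attributes:

```python
from pathlib import Path

def to_bundle_relative(fileloc: str, bundle_name: str, bundle_root: str) -> tuple[str, str]:
    """Identify a parsed file by (bundle name, path relative to the bundle
    root) instead of by its absolute path on disk."""
    relative = Path(fileloc).relative_to(bundle_root)
    return bundle_name, str(relative)

# e.g. to_bundle_relative("/bundles/etl/dags/my_dag.py", "etl", "/bundles/etl")
# returns ("etl", "dags/my_dag.py")
```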
dauinh and others added 15 commits February 1, 2025 00:18
* rebasing

* fix regex

* remove pnpm from package.json
…pache#46310)

Since each job is producing the same content we want the key to be the same,
but we don't need to update it from both -- that is wasteful and, worse, prone
to race conditions and 409 HTTP errors.

Rather than ignore all errors by specifying `continue-on-error: true` in the
step, we choose to instead only upload from one matrix job.
Some points of note about this change:

- Logging is changed in Celery, but only for Airflow 3

  Celery does its own "capture stdout" logging, which conflicts with the
  logging we do in the TaskSDK, so we disable that; we only do this on
  Airflow 3, so as not to change anything for Airflow 2.

- Simplify task SDK logging redirection

  While discovering that Celery captures stdout/stderr itself (and before
  disabling that), I found a simpler way to re-open stdin/stdout/stderr so
  that the implementation needs fewer special cases.

- Make JSON task logs more readable by giving them a consistent/useful order

  We re-order (by re-creating) the event_dict so that timestamp, level, and
  then event are always the first items in the dict (as sketched after this
  commit message).

- Makes the CeleryExecutor understand the concept of "workloads" instead of a
  command tuple.

  This change isn't done in the best way, but until Kube executor is swapped
  over (and possibly the other in-tree executors, such as ECS) we need to
  support both styles concurrently.

  The change should be done in such a way that the provider still works with
  Airflow v2, if it's running on that version.

- Upgrade Celery

  This turned out to not be 100% necessary but it does fix some deprecation
  warnings when running on Python 3.12

- Ensure that the forked process in TaskSDK _always_ exits

  Again, this usually isn't possible, but since the setup step of `_fork_main`
  died it didn't call `os._exit()`; the exception was caught further up, which
  meant the process stayed alive as it never closed the sockets properly. We
  put an extra safety try/except block in place to catch that (a minimal
  sketch follows this commit message).

I have not yet included a newsfragment for changing the executor interface as
the old style is _currently_ still supported.
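
A rough sketch of the two logging-related points above, assuming a plain Celery app and a structlog pipeline; this is not the actual TaskSDK code, and the processor name is made up for the example:

```python
import structlog
from celery import Celery

app = Celery("worker")
# Turn off Celery's own stdout/stderr capture, which would conflict with a
# separate redirection layer (this is Celery's real worker_redirect_stdouts setting).
app.conf.worker_redirect_stdouts = False

def reorder_keys(logger, method_name, event_dict):
    # Rebuild the event_dict so timestamp, level, and event always come
    # first, which makes JSON log lines far easier to scan.
    front = {k: event_dict.pop(k) for k in ("timestamp", "level", "event") if k in event_dict}
    return {**front, **event_dict}

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", key="timestamp"),
        reorder_keys,
        structlog.processors.JSONRenderer(),
    ]
)

structlog.get_logger().info("task started", dag_id="example")
```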
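
And a minimal sketch, under the same caveat, of the "child must always exit" pattern from the last bullet; `run_task` and `fork_main` are placeholders, not the real `_fork_main` internals:

```python
import os

def run_task() -> None:
    # Placeholder for the real per-task setup and execution.
    print("hello from the forked child")

def fork_main() -> int:
    pid = os.fork()  # POSIX-only, as in the supervisor this sketches
    if pid:
        return pid  # parent: keep supervising the child

    # Child: no exception may escape back into the parent's code path. If
    # setup raised before reaching os._exit(), the child would live on with
    # the parent's sockets still open, so a blanket try/except guards it.
    try:
        run_task()
    except BaseException:
        os._exit(1)
    os._exit(0)
```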
This was using `constraints-{AIRFLOW_BRANCH}`, which could be e.g.
`constraints-v3-0-test` outside of main, where it should be
`constraints-3-0`, which is already defined in
`DEFAULT_AIRFLOW_CONSTRAINTS_BRANCH`.
* Add autorefresh to dag page

* remove areActiveRuns check from task list card

* Fix dag header toggle pause

* Check if dag is paused, move to custom hook

* Address PR feedback
In order to release a usable alpha build of apache-airflow ("core") we'll need
an alpha version of the task-sdk to go along with it.
…Interface (apache#46319)

This is needed so that we can have the scheduler create the expand TIs for the
downstream tasks.

Note that we don't enforce that this is set when it needs to be on the server
side, because the only way for us to know whether it is needed (and conversely,
whether it is set when it _doesn't_ need to be, which has no effect other than
creating a small DB row that nothing will ever read) would involve loading the
serialized DAG to walk the structure, which is a relatively expensive
operation.

We could improve that by "pre-computing" some of the info on what tasks are
actually mapped or not before serialization so we wouldn't have to walk the
task groups to find out, but that wouldn't do anything about the need to load
the serialized DAG which is the expensive part.

If this turns out to be a problem we can revisit the decision not to enforce
this later.
* Add autorefresh to dags list page

* Drop changes to toggle pause

* Fix toggle paused again
Older versions of the apache-beam package depend upon grpcio-tools, but the
version requirement they specify doesn't work on Python 3.12; the metadata
doesn't say that, so pip tries to compile it and fails.

This updates the minimum version to the lowest supported version that has
wheels for Python 3.12.
@Prab-27 (Contributor, author) commented Jan 31, 2025

Fine?
Should I close this PR and raise another one?

@Prab-27 (Contributor, author) commented Jan 31, 2025

Sorry, I accidentally made a mistake.
I will raise another PR.

@Prab-27 Prab-27 closed this Jan 31, 2025